Abstract: Recent work has demonstrated that using a carefully
designed sensing matrix, rather than a random one, can improve
the performance of compressed sensing. In particular, a well- 
designed sensing matrix can reduce the coherence between the 
atoms of the equivalent dictionary, and as a consequence, reduce 
the reconstruction error. In some applications, the signals of 
interest can be well approximated by a union of a small number 
of subspaces (e.g., face recognition and motion segmentation). 
This implies the existence of a dictionary which leads to block- 
sparse representations. In this work, we propose a framework 
for sensing matrix design that improves the ability of block- 
sparse approximation techniques to reconstruct and classify 
signals. This method is based on minimizing a weighted sum 
of the inter-block coherence and the sub-block coherence of the 
equivalent dictionary. Our experiments show that the proposed 
algorithm significantly improves signal recovery and classification 
ability of the Block-OMP algorithm compared to sensing matrix 
optimization methods that do not employ block structure. 

I. Introduction 

The framework of compressed sensing aims at recovering
an unknown vector $x \in \mathbb{R}^N$ from an under-determined
system of linear equations $y = Ax$, where $A \in \mathbb{R}^{M \times N}$
is a sensing matrix, and $y \in \mathbb{R}^M$ is an observation vector
with $M < N$. Since the system is under-determined, x cannot
be recovered without additional information. In [1], [2]
it was shown that when x is known to have a sufficiently 
sparse representation, and when A is randomly generated, 
x can be recovered uniquely with high probability from the 
measurements y. More specifically, the assumption is that x 
can be represented as $x = D\theta$ for some orthogonal dictionary
$D \in \mathbb{R}^{N \times N}$, where $\theta \in \mathbb{R}^N$ is sufficiently sparse. The vector
x can then be recovered regardless of D and irrespective
of the locations of the nonzero entries of $\theta$. This can be
achieved by approximating the sparsest representation $\theta$ using
methods such as Basis Pursuit (BP) 0, U] and Orthogonal 
Matching Pursuit (OMP) 0, 0. In practice, overcomplete 
dictionaries $D \in \mathbb{R}^{N \times K}$ with $K > N$ lead to improved sparse
representations and are better suited for most applications. 
Therefore, we treat the more general case of overcomplete 
dictionaries in this paper. 

A simple way to characterize the recovery ability of sparse 
approximation algorithms was presented in [4], using the
coherence between the columns of the equivalent dictionary 
E = AD. When the coherence is sufficiently low, OMP and 
BP are guaranteed to recover the sparse vector $\theta$. Accordingly,
recent work [6], [7], [8] has demonstrated that designing a
sensing matrix such that the coherence of E is low improves 
the ability to recover $\theta$. The proposed methods yield good
results for general sparse vectors. 

In some applications, however, the representations have a 
unique sparsity structure that can be exploited. Our interest 
is in the case of signals that are drawn from a union of a 
small number of subspaces [9], [10], [11], [12]. This occurs
naturally, for example, in face recognition [13], [14], motion
segmentation [15], multi-band signals [16], [17], [18],
measurements of gene expression levels [19], and more. For such
signals, sorting the dictionary atoms according to the underly- 
ing subspaces leads to sparse representations which exhibit a 
block-sparse structure, i.e., the nonzero coefficients in $\theta$ occur
in clusters of varying sizes. Several methods, such as Block-
BP (BBP) [12], [20], [21] and Block-OMP (BOMP) [22], [23]
have been proposed to take advantage of this block structure
in recovering the block-sparse representations $\theta$. Bounds on
the recovery performance were presented in [12] based on
the block restricted isometry property (RIP), and in [22] using
appropriate coherence measures. In particular, it was shown in
[22] that under conditions on the inter-block coherence (i.e.,
the maximal coherence between two blocks) and the sub-block
coherence (i.e., the maximal coherence between two atoms in
the same block) of the equivalent dictionary E, Block-OMP
is guaranteed to recover the block-sparse vector $\theta$.

In this paper we propose a method for designing a sensing 
matrix, assuming that a block-sparsifying dictionary is pro- 
vided. A method for learning a block-sparsifying dictionary 
is developed in [24]. Our approach improves the recovery
ability of block-sparse approximation algorithms by targeting 
the Gram matrix of the equivalent dictionary, an approach 
similar in spirit to that of earlier coherence-based designs such as [8]. While
those designs targeted minimization of the coherence between atoms, our method,
which will be referred to as Weighted Coherence Minimization 
(WCM), aims at reducing a weighted sum of the inter-block 
coherence and the sub-block coherence. 

It turns out that the weighted coherence objective is hard to 
minimize directly. To derive an efficient algorithm, we use the 
bound-optimization method, and replace our objective with an 
easier to minimize surrogate function that is updated in each 
optimization step [25]. We develop a closed form solution for
minimizing the surrogate function in each step, and prove that 
its iterative minimization is guaranteed to converge to a local 
solution of the original problem. 

Our experiments reveal that minimizing the sub-block co- 
herence is more important than minimizing the inter-block 
coherence. By giving more weight to minimizing the sub-block 
coherence, the proposed algorithm yields sensing matrices that 
lead to equivalent dictionaries with nearly orthonormal blocks. 
Simulations show that such sensing matrices significantly 
improve signal reconstruction and signal classification results
compared to previous approaches that do not employ block 
structure. 

We begin by reviewing previous work on sensing matrix
design in Section II. In Section III we introduce our definitions
of total inter-block coherence and total sub-block coherence.
We then present the objective for sensing matrix design, and
show that it can be considered as a direct extension of the
one used in [8] to the case of blocks. We present the WCM
algorithm for minimizing the proposed objective in Section IV
and prove its convergence in Appendix A. We evaluate the
performance of the proposed algorithm and compare it to
previous work in Section V.

Throughout the paper, we denote vectors by lowercase
letters, e.g., x, and matrices by uppercase letters, e.g., A. A'
is the transpose of A. The jth column of the matrix A is $A_j$,
and the ith row is $A^i$. The entry of A in the row with index i
and the column with index j is $A^i_j$. We define the Frobenius
norm by $\|A\|_F = \sqrt{\sum_j \|A_j\|_2^2}$, and the $\ell_p$-norm of a vector
x by $\|x\|_p$. The $\ell_0$-norm $\|x\|_0$ counts the number of non-zero
entries in x. We denote the identity matrix by $I$, or $I_s$ when the
dimension is not clear from the context. The largest eigenvalue
of the positive-semidefinite matrix B'B is written as $\lambda_{\max}(B)$.

II. Prior work on sensing matrix design 

The goal of sensing matrix design is to construct a sensing
matrix $A \in \mathbb{R}^{M \times N}$ with $M < N$ that improves the recovery
ability for a given sparsifying dictionary $D \in \mathbb{R}^{N \times K}$ with
$K > N$. In other words, A is designed to improve the ability
of sparse approximation algorithms such as BP and OMP to
recover the sparsest representation $\theta$ from

$$y = AD\theta = E\theta, \tag{1}$$

where E is the equivalent dictionary. In this section we
briefly review the sensing matrix design method introduced by
Duarte-Carvajalino and Sapiro [8]. Their algorithm was shown
to provide significant improvement in reconstruction success.

The motivation to design sensing matrices stems from the
theoretical work of [4], where it was shown that BP and OMP
succeed in recovering $\theta$ when the following condition holds:



$$\|\theta\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu}\right). \tag{2}$$

Here $\mu$ is the coherence defined by:

$$\mu = \max_{i \neq j} \frac{|E_i' E_j|}{\|E_i\|_2 \|E_j\|_2}. \tag{3}$$

The smaller $\mu$, the higher the bound on the sparsity of $\theta$.
Since E is overcomplete, and as a consequence not orthogonal,
$\mu$ will always be strictly positive. Condition (2) is a worst-
case bound and does not reflect the average recovery ability
of sparse approximation methods. However, it does suggest
that recovery may be improved when E is as orthogonal as
possible.
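
As an illustration of the coherence (3) and the bound in condition (2), the following is a minimal NumPy sketch. The function names `coherence` and `sparsity_bound` and the small example matrix are assumptions made here for illustration; they do not come from the paper.

```python
import numpy as np

def coherence(E):
    """Coherence (3): largest absolute normalized inner product
    between two distinct columns (atoms) of E."""
    norms = np.linalg.norm(E, axis=0)
    C = np.abs(E.T @ E) / np.outer(norms, norms)
    np.fill_diagonal(C, 0.0)          # ignore each atom's correlation with itself
    return C.max()

def sparsity_bound(mu):
    """Right-hand side of condition (2)."""
    return 0.5 * (1.0 + 1.0 / mu)

# Hypothetical 2 x 3 equivalent dictionary: two orthogonal atoms plus
# one atom at 45 degrees to both, so the coherence is 1/sqrt(2).
E = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
mu = coherence(E)
bound = sparsity_bound(mu)
```

For this toy dictionary the bound is below 2, so condition (2) only covers representations with a single nonzero entry, which shows how conservative the worst-case guarantee can be.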

Motivated by these observations, Duarte-Carvajalino and
Sapiro [8] proposed designing a sensing matrix A by minimizing
$\|E'E - I\|_F^2$. This problem can be written as:

$$\min_A \|E'E - I\|_F^2 = \min_A \|D'A'AD - I\|_F^2. \tag{4}$$

It is important to note that rather than minimizing $\mu$, (4)
minimizes the sum of the squared inner products of all pairs
of atoms in E, referred to as the total coherence $\mu^t$:

$$\mu^t = \sum_{i \neq j} (E_i' E_j)^2. \tag{5}$$

At the same time, solving (4) keeps the norms of the atoms
close to 1.

While an approximate solution to (4) has already been
presented in [8], we provide an exact solution that will be of
use in the next sections. To solve (4), we rewrite its objective
using the well-known relation between the Frobenius norm
and the trace, $\|C\|_F^2 = \mathrm{tr}(CC')$:

$$\begin{aligned}
\|E'E - I_K\|_F^2 &= \mathrm{tr}(E'EE'E - 2E'E + I_K) \\
&= \mathrm{tr}(EE'EE' - 2EE' + I_M) + (K - M) \\
&= \|EE' - I_M\|_F^2 + (K - M) \\
&= \|ADD'A' - I_M\|_F^2 + (K - M).
\end{aligned} \tag{6}$$

Since the first term in (6) is always nonnegative, the objective of
(4) is lower bounded by $\|E'E - I\|_F^2 \geq K - M$.
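
The passage from the first to the second line of (6) relies only on the cyclic property of the trace and on the difference in size between the two identity matrices. Spelled out term by term:

```latex
\begin{align*}
\operatorname{tr}(E'EE'E) &= \operatorname{tr}(EE'EE'), \\
\operatorname{tr}(E'E)    &= \operatorname{tr}(EE'), \\
\operatorname{tr}(I_K)    &= K = \operatorname{tr}(I_M) + (K - M).
\end{align*}
```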

From (6) it follows that minimizing (4) is equivalent to the
minimization of $\|ADD'A' - I_M\|_F^2$. A solution to this problem
can be achieved in closed form as follows. Let $U \Lambda U'$ be the
eigenvalue decomposition of DD', and let $\Gamma_{M \times N} = AU\Lambda^{1/2}$.
Then, (4) is equivalent to:

$$\min_\Gamma \|\Gamma\Gamma' - I\|_F^2. \tag{7}$$

This problem is solved by choosing $\Gamma$ to be any matrix with
orthonormal rows, such as $\Gamma = [I_M \; 0]$, leading to $\Gamma\Gamma' = I$.
The optimal sensing matrix is then given by $A = \Gamma\Lambda^{-1/2}U'$.
Here, and throughout the paper, we assume that D has full row
rank, guaranteeing that $\Lambda$ is invertible. Note that the global
minimum of the objective in (4) equals K - M. The benefits
of using such a sensing matrix were shown empirically in [8].
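
The closed-form solution above can be checked numerically. The sketch below is an illustration under stated assumptions: the function name `ds_sensing_matrix`, the random dictionary and the problem sizes are hypothetical, and D is assumed to have full row rank.

```python
import numpy as np

def ds_sensing_matrix(D, M):
    """Closed-form minimizer of ||D'A'AD - I||_F^2 using Gamma = [I_M 0].
    Assumes D (N x K) has full row rank, so DD' is invertible."""
    lam, U = np.linalg.eigh(D @ D.T)        # DD' = U diag(lam) U'
    Gamma = np.eye(M, D.shape[0])           # M x N matrix with orthonormal rows
    return Gamma @ np.diag(lam ** -0.5) @ U.T

rng = np.random.default_rng(0)
N, K, M = 12, 18, 6                         # hypothetical sizes
D = rng.standard_normal((N, K))
A = ds_sensing_matrix(D, M)
E = A @ D                                   # equivalent dictionary
objective = np.linalg.norm(E.T @ E - np.eye(K), "fro") ** 2
```

With this choice the rows of E are orthonormal (EE' = I), so the first term of (6) vanishes and the objective attains its lower bound K - M.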

The same solution is obtained by setting the derivative of
(7) equal to zero:

$$\frac{\partial \|\Gamma\Gamma' - I\|_F^2}{\partial \Gamma} = 4(\Gamma\Gamma'\Gamma - \Gamma) = 0. \tag{8}$$

It can be deduced from (8) that for stationary points, the
singular values of $\Gamma$ must be equal to either one or zero.
However, only when all the M singular values of $\Gamma$ equal one,
i.e., $\Gamma$ has full row rank, we have a local minimum (the other
stationary points being a local maximum and saddle points).
It is important to keep in mind that even though the objective
is not convex, every local minimum is a global minimum as
well.

III. Sensing matrix design for block-sparse decoding

The design of a sensing matrix according to [8] does not
take advantage of block structure in the sparse representations
of the data. In this section we formulate the problem of sensing
matrix design for block-sparse decoding. We first introduce the
basic concepts of block-sparsity, and then present an objective
which can be seen as an extension of (4) to the case of block-
sparse decoding.

A. Block-sparse decoding 

The framework of block-sparse decoding aims at recovering
an unknown vector $x \in \mathbb{R}^N$ from an under-determined system
of linear equations $y = Ax$, where $A \in \mathbb{R}^{M \times N}$ is a sensing
matrix, and $y \in \mathbb{R}^M$ is an observation vector with $M < N$.
The difference with sparse recovery lies in the assumption
that x has a sufficiently block-sparse representation $\theta \in \mathbb{R}^N$
with respect to some orthogonal block-sparsifying dictionary
$D \in \mathbb{R}^{N \times N}$. The vector x can then be recovered by approximating
the block-sparsest representation corresponding to the
measurements y using methods such as Block-BP (BBP) [12],
[20], [21] and Block-OMP (BOMP) [22], [23].

A block-sparsifying dictionary D is a dictionary whose
atoms are sorted in blocks which enable block-sparse
representations for a set of signals. We can represent D as a
concatenation of B column-blocks D[j] of size $N \times s_j$, where
$s_j$ is the number of atoms belonging to the jth block:

$$D = [D[1] \; D[2] \; \dots \; D[B]].$$

Similarly, we view the representation $\theta$ as a concatenation of
B blocks $\theta[j]$ of length $s_j$:

$$\theta = [\theta[1] \; \theta[2] \; \dots \; \theta[B]]'.$$

We say that a representation $\theta$ is k-block-sparse if its nonzero
values are concentrated in k blocks only. This is denoted by
$\|\theta\|_{2,0} \leq k$, where

$$\|\theta\|_{2,0} = \sum_{j=1}^{B} I(\|\theta[j]\|_2 > 0).$$

The indicator function $I(\cdot)$ counts the number of blocks in $\theta$
with nonzero Euclidean norm.
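
The block count defined above can be sketched in a few lines of NumPy. The function name `block_l20_norm`, the tolerance argument `tol` and the example vector are assumptions introduced here for illustration.

```python
import numpy as np

def block_l20_norm(theta, block_sizes, tol=0.0):
    """Number of blocks of theta whose Euclidean norm exceeds tol.
    block_sizes lists the block lengths s_1, ..., s_B in order."""
    edges = np.cumsum([0] + list(block_sizes))
    return int(sum(
        np.linalg.norm(theta[a:b]) > tol
        for a, b in zip(edges[:-1], edges[1:])
    ))

# Hypothetical representation with three blocks of size 3:
# the first block is zero, the other two are active.
theta = np.array([0.0, 0.0, 0.0, 1.5, -2.0, 0.0, 0.0, 0.0, 0.3])
count = block_l20_norm(theta, [3, 3, 3])
```

Here `theta` has three nonzero entries but only two active blocks, so it is 2-block-sparse.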

B. Problem definition 

For a given block-sparsifying dictionary $D \in \mathbb{R}^{N \times K}$ with
$K > N$, we wish to design a sensing matrix $A \in \mathbb{R}^{M \times N}$ that
improves the recovery ability of block-sparse approximation
algorithms. Note that we allow D to be overcomplete.

A performance bound on the recovery success of block-
sparse signals has been developed in [22] for the case of a
dictionary D with blocks of a fixed size s (i.e., $s_i = s_j = s$) and
an equivalent dictionary E = AD with normalized columns.
The bound is a function of the Gram matrix $G \in \mathbb{R}^{K \times K}$ of the
equivalent dictionary, defined as E'E. The (i, j)th block of G,
E[i]'E[j], is denoted by $G[i,j] \in \mathbb{R}^{s_i \times s_j}$. The (i, j)th block of
any other K x K matrix will be denoted similarly. It was shown
in [22] that BBP and BOMP succeed in recovering the block-
sparsest representation $\theta$ corresponding to the measurements
$y = E\theta$ when the following condition holds:

$$\|\theta\|_{2,0} < \frac{1}{2s}\left(\mu_B^{-1} + s - (s-1)\frac{\nu}{\mu_B}\right). \tag{9}$$


Here

$$\mu_B = \max_{i \neq j} \frac{1}{s} \sqrt{\lambda_{\max}(G[i,j]'G[i,j])}$$

is the inter-block coherence and

$$\nu = \max_{j} \max_{m \neq n} |(G[j,j])^m_n|$$

is the sub-block coherence. The inter-block coherence $\mu_B$ is
a generalization of the coherence $\mu$, and describes the global
properties of the equivalent dictionary. More specifically, $\mu_B$
measures the cosine of the minimal angle between two blocks
in E. The sub-block coherence $\nu$ describes the local properties
of the dictionary, by measuring the cosine of the minimal angle
between two atoms in the same block in E. Note that when
s = 1, (9) reduces to the bound in the sparse case (2). The
term $\mu_B^{-1}$ in (9) suggests that $\mu_B$ needs to be reduced in order
to loosen the bound. On the other hand, the term $-(s-1)\frac{\nu}{\mu_B}$
implies that the ratio $\frac{\nu}{\mu_B}$ should be small. This leads to a
trade-off between minimizing $\mu_B$ and minimizing $\nu$ to loosen
the bound, which is reflected in the sensing matrix design
objective presented later in this section.
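
To make the two coherence measures and the bound (9) concrete, the sketch below computes them for a dictionary with equal-sized blocks. The names `block_coherences` and `block_sparsity_bound`, the random dictionary and its sizes are assumptions for illustration; the columns of E are assumed normalized and K is assumed to be a multiple of s.

```python
import numpy as np

def block_coherences(E, s):
    """Inter-block coherence mu_B and sub-block coherence nu for blocks of
    equal size s. Assumes normalized columns and K divisible by s."""
    K = E.shape[1]
    B = K // s
    G = E.T @ E
    mu_B, nu = 0.0, 0.0
    for i in range(B):
        for j in range(B):
            blk = G[i * s:(i + 1) * s, j * s:(j + 1) * s]
            if i != j:
                # spectral norm of the block = sqrt(lambda_max(blk' blk))
                mu_B = max(mu_B, np.linalg.norm(blk, 2) / s)
            else:
                off = blk - np.diag(np.diag(blk))   # off-diagonal part only
                nu = max(nu, np.max(np.abs(off)))
    return mu_B, nu

def block_sparsity_bound(mu_B, nu, s):
    """Right-hand side of condition (9)."""
    return (1.0 / mu_B + s - (s - 1) * nu / mu_B) / (2 * s)

rng = np.random.default_rng(0)
E = rng.standard_normal((10, 12))           # hypothetical equivalent dictionary
E /= np.linalg.norm(E, axis=0)              # normalize the atoms
mu_B, nu = block_coherences(E, s=3)
bound = block_sparsity_bound(mu_B, nu, s=3)
```

Calling the same functions with `s=1` gives `nu = 0` and a bound equal to the right-hand side of (2), in line with the remark that (9) reduces to the sparse case.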

Condition (9) is a worst-case bound and does not represent
the average recovery ability of block-sparse approximation
methods. It does suggest, however, that in order to improve
the average recovery, all pairs of blocks in E should be as
orthogonal as possible and also all pairs of atoms within each
block should be as orthogonal as possible. Inspired by [8],
rather than minimizing the inter-block coherence $\mu_B$ and the
sub-block coherence $\nu$, we aim at minimizing the total inter-
block coherence $\mu_B^t$ and the total sub-block coherence $\nu^t$ of
the equivalent dictionary E. We define the total inter-block
coherence as

$$\mu_B^t = \sum_{j=1}^{B} \sum_{i \neq j} \|G[i,j]\|_F^2, \tag{10}$$

and the total sub-block coherence by

$$\nu^t = \sum_{j=1}^{B} \|G[j,j]\|_F^2 - \sum_{m=1}^{K} (G^m_m)^2, \tag{11}$$

where $G^m_m$ are the diagonal entries of G. The total inter-block
coherence $\mu_B^t$ equals the sum of the squared entries in G
belonging to different blocks (the green entries in Fig. 1).
Since this is the sum of Frobenius norms, $\mu_B^t$ also equals
the sum of the squared singular values of the cross-correlation
blocks in G. When E is normalized, $\mu_B^t$ is equivalent to the
sum of the squared cosines of all the principal angles between
all pairs of different blocks. The total sub-block coherence $\nu^t$
measures the sum of the squared off-diagonal entries belonging
to the same block (the red entries in Fig. 1). When E is
normalized, $\nu^t$ equals the sum of the squared cosines of all
the angles between atoms within the same block. Note that
when the size of the blocks equals one, we get $\nu^t = 0$.

Alternatively, one could define the total inter-block
coherence as the sum of the squared spectral norms (i.e., the
largest singular values) of the cross-correlation blocks in G,
and the total sub-block coherence as the sum of the squared
maximal off-diagonal entries of the auto-correlation blocks in
G. These definitions are closer to the ones used in condition
(9). The WCM algorithm presented in the next section can be
slightly modified in order to minimize those measures as well.
However, besides the increased complexity of the algorithm,
the results appear to be inferior compared to minimizing the
definitions (10) and (11) of $\mu_B^t$ and $\nu^t$. This can be explained
by the fact that maximizing only the smallest principal angle
between pairs of different blocks in E and maximizing the
smallest angle between atoms within the same block, creates
a bulk of relatively high singular values and coherence values.
While this may improve the worst-case bound in (9), it does
not necessarily improve the average recovery ability of block-
sparse approximation methods.

Fig. 1. A graphical depiction of the Gram matrix G of an equivalent
dictionary E with 6 blocks of size 3. The entries belonging to different blocks
are in green, the off-diagonal entries belonging to the same block are in red,
and the diagonal entries are in yellow.



When minimizing the total inter-block coherence and the
total sub-block coherence, we need to verify that the columns
of E are normalized, to avoid the tendency of columns with
small norm values to be underused. Rather than enforcing
normalization strictly, we penalize columns with norms
that deviate from 1 by defining the normalization penalty $\eta$:

$$\eta = \sum_{m=1}^{K} (G^m_m - 1)^2. \tag{12}$$

This penalty $\eta$ measures the sum of the squared distances
between the diagonal entries in G (the yellow entries in Fig.
1) and 1.



While [8] did not deal with the block-sparse case, it is
straightforward to see that solving (4) is equivalent to minimizing
the sum of the normalization penalty, the total inter-block
coherence and the total sub-block coherence:

$$\begin{aligned}
\|E'E - I\|_F^2 &= \sum_{j=1}^{B} \sum_{i \neq j} \|E[i]'E[j]\|_F^2 + \sum_{j=1}^{B} \|E[j]'E[j] - I\|_F^2 \\
&= \sum_{j=1}^{B} \sum_{i \neq j} \|G[i,j]\|_F^2 + \sum_{j=1}^{B} \|G[j,j] - I\|_F^2 \\
&= \sum_{j=1}^{B} \sum_{i \neq j} \|G[i,j]\|_F^2 + \sum_{j=1}^{B} \|G[j,j]\|_F^2 - \sum_{m=1}^{K} (G^m_m)^2 + \sum_{m=1}^{K} (G^m_m - 1)^2 \\
&= \eta + \mu_B^t + \nu^t.
\end{aligned}$$

We have shown in the previous section that the objective in
(4) is bounded below by K - M. Therefore,

$$\eta + \mu_B^t + \nu^t \geq K - M. \tag{13}$$
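
The decomposition above and the bound (13) can be verified numerically. The following NumPy sketch is illustrative only: the helper name `total_measures`, the random matrices and the block layout are assumptions, not part of the original experiments.

```python
import numpy as np

def total_measures(G, block_sizes):
    """Return (eta, mu_B^t, nu^t) as in (12), (10) and (11) for a Gram
    matrix G whose atoms are grouped into blocks of the given sizes."""
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    same_block = labels[:, None] == labels[None, :]
    diag = np.eye(G.shape[0], dtype=bool)
    eta = np.sum((np.diag(G) - 1.0) ** 2)        # diagonal deviation from 1
    mu_t = np.sum(G[~same_block] ** 2)           # entries across different blocks
    nu_t = np.sum(G[same_block & ~diag] ** 2)    # off-diagonal, same block
    return eta, mu_t, nu_t

rng = np.random.default_rng(1)
N, K, M = 12, 18, 6                              # hypothetical sizes
D = rng.standard_normal((N, K))
A = rng.standard_normal((M, N))                  # arbitrary, non-optimized A
E = A @ D
G = E.T @ E
eta, mu_t, nu_t = total_measures(G, [3] * 6)
total = np.linalg.norm(G - np.eye(K), "fro") ** 2
```

The three measures partition the entries of G - I, so their sum matches `total` for any sensing matrix, and `total` never drops below K - M.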



This bound implies a trade-off, and as a consequence, one
cannot minimize $\eta$, $\mu_B^t$ and $\nu^t$ freely. Instead, we propose
designing a sensing matrix that minimizes the normalization
penalty and a weighted sum of the total inter-block coherence
and the total sub-block coherence:

$$A = \arg\min_A \; \tfrac{1}{2}\eta + (1 - \alpha)\mu_B^t + \alpha\nu^t, \tag{14}$$

where $0 < \alpha < 1$ is a parameter controlling the weight
given to the total inter-block coherence and the total sub-block
coherence. Note that alternative objectives can be formulated.
For example, one could add an additional weighting parameter
to the normalization penalty term. While this would allow us
to better control the normalization of the atoms in E, we prefer
to deal with a single parameter only.

When $\alpha < \frac{1}{2}$, more weight is given to minimizing $\mu_B^t$,
and therefore solving (14) leads to lower total inter-block
coherence, which is made possible by aligning the atoms
within each block (Fig. 2(a)). On the other hand, choosing
$\alpha > \frac{1}{2}$ gives more weight to minimizing $\nu^t$. In this case,
solving (14) leads to more orthonormal blocks in E at the
expense of higher $\mu_B^t$ (Fig. 2(c)). Finally, setting $\alpha = \frac{1}{2}$ in
(14) gives equal weights to $\mu_B^t$, $\nu^t$ and $\eta$, and reduces it to
(4) (Fig. 2(b)). Therefore, the objective becomes independent
of the block structure, which makes $\alpha = \frac{1}{2}$ the correct
choice when the signals do not have an underlying block
structure. Choosing to ignore the block structure leads to the
same conclusion. When an underlying block structure exists,
we need to select a value for $\alpha$. We do that via empirical
evaluation in Section V.

In the previous section we have shown that every local
minimum of (4), and therefore also of (14) with $\alpha = \frac{1}{2}$, is
also a global minimum. Empirical observations reveal that
this is not the case when $\alpha \neq \frac{1}{2}$. This is demonstrated in
the histograms presented in Fig. 3(a) for $\alpha = 0.01$ with a
square dictionary and in Fig. 4(b) for $\alpha = 0.99$ with a highly
overcomplete dictionary. Since it is hard to develop a closed
form solution for (14), we present an iterative algorithm that
converges to a local solution of (14) in the following section.

IV. Weighted Coherence Minimization 

In this section, we present the Weighted Coherence
Minimization (WCM) algorithm for minimizing (14), based on the
bound-optimization method [25]. This algorithm substitutes
the original objective with an easier to minimize surrogate
objective that is updated in each optimization step. After
defining a surrogate function and showing it can be minimized
in closed form, we prove that its iterative minimization is
guaranteed to converge to a local solution of the original
problem.




Fig. 2. Examples of the absolute value of the Gram matrix of an equivalent
dictionary for $\alpha = 0.01$ (a), $\alpha = 0.5$ (b) and $\alpha = 0.99$ (c), where the sensing
matrix of size $12 \times 18$ was found by solving (14) given a randomly selected
square dictionary composed of 6 blocks of size 3. The sub-block entries are
highlighted by red squares.



A. The Weighted Coherence Minimization Algorithm 

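To obtain a surrogate function we rewrite the objective of
(14), which we denote by f(G), as a function of the Gram
matrix of the equivalent dictionary G = D'A'AD:

$$\begin{aligned}
f(G) &= \tfrac{1}{2}\eta(G) + (1 - \alpha)\mu_B^t(G) + \alpha\nu^t(G) \\
&= \tfrac{1}{2}\|u_\eta(G)\|_F^2 + (1 - \alpha)\|u_\mu(G)\|_F^2 + \alpha\|u_\nu(G)\|_F^2,
\end{aligned}$$

where the matrix operators $u_\eta$, $u_\mu$ and $u_\nu$ are defined as:

$$u_\eta(G)[i,j]^m_n = \begin{cases} G[i,j]^m_n - 1, & i = j,\ m = n; \\ 0, & \text{else,} \end{cases}$$

$$u_\mu(G)[i,j]^m_n = \begin{cases} G[i,j]^m_n, & i \neq j; \\ 0, & \text{else,} \end{cases}$$

$$u_\nu(G)[i,j]^m_n = \begin{cases} G[i,j]^m_n, & i = j,\ m \neq n; \\ 0, & \text{else,} \end{cases}$$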



Fig. 3. Histograms of the objective values obtained when solving (14) 100
times with $\alpha = 0.01$ (a) and $\alpha = 0.99$ (b), for a given randomly generated
square dictionary composed of 6 blocks of size 3. The sensing matrices of
size $12 \times 18$ are initialized as matrices with random entries. Note that the
spread of the distribution is negligible in (b), indicating that in this specific
case, every local minimum is also a global minimum.

Fig. 4. Histograms of the objective values obtained when solving (14)
100 times with $\alpha = 0.01$ (a) and $\alpha = 0.99$ (b), for a given randomly
generated overcomplete dictionary composed of 24 blocks of size 3. The
sensing matrices of size $12 \times 18$ are initialized as matrices with random
entries.



with $G[i,j]^m_n$ denoting the (m, n)th entry of G[i,j]. This
equation follows directly from the definitions of $\eta$, $\mu_B^t$ and
$\nu^t$. We can now write:

$$f(G) = \tfrac{1}{2}\|G - h_\eta(G)\|_F^2 + (1 - \alpha)\|G - h_\mu(G)\|_F^2 + \alpha\|G - h_\nu(G)\|_F^2, \tag{15}$$

where the matrix operators $h_\eta$, $h_\mu$ and $h_\nu$ are defined as:

$$h_\eta(G)[i,j]^m_n = \begin{cases} 1, & i = j,\ m = n; \\ G[i,j]^m_n, & \text{else,} \end{cases}$$

$$h_\mu(G)[i,j]^m_n = \begin{cases} 0, & i \neq j; \\ G[i,j]^m_n, & \text{else,} \end{cases}$$

$$h_\nu(G)[i,j]^m_n = \begin{cases} 0, & i = j,\ m \neq n; \\ G[i,j]^m_n, & \text{else.} \end{cases}$$

Based on (15), we define a surrogate objective $g(G, G^{(n)})$
at the nth iteration as:

$$g(G, G^{(n)}) = \tfrac{1}{2}\|G - h_\eta(G^{(n)})\|_F^2 + (1 - \alpha)\|G - h_\mu(G^{(n)})\|_F^2 + \alpha\|G - h_\nu(G^{(n)})\|_F^2, \tag{16}$$

where $G^{(n)} = D'A^{(n)\prime}A^{(n)}D$ is the Gram matrix of the
equivalent dictionary from the previous iteration. In Appendix
B, we prove that $g(G, G^{(n)})$ satisfies the conditions of a
surrogate objective for the bound-optimization method. Therefore,
iteratively minimizing $g(G, G^{(n)})$ is guaranteed to converge to
a local minimum of the original objective f(G), i.e., to a local
solution of (14).
The following proposition describes the closed form
solution to minimizing $g(G, G^{(n)})$ at each iteration.

Proposition 1: The function $g(G, G^{(n)})$ is minimized by
choosing

$$A^{(n+1)} = \Delta_M^{1/2} V_M' \Lambda^{-1/2} U',$$

where $U \Lambda U'$ is the eigenvalue decomposition of DD', $\Delta_M$
and $V_M$ are the top M eigenvalues and the corresponding M
eigenvectors of $\Lambda^{-1/2} U' D h_t(G^{(n)}) D' U \Lambda^{-1/2}$, and:

$$h_t(\cdot) = \frac{2}{3}\left(\frac{1}{2}h_\eta(\cdot) + (1 - \alpha)h_\mu(\cdot) + \alpha h_\nu(\cdot)\right). \tag{17}$$

Proof: See Appendix B. ■
A summary of the proposed WCM algorithm is given below. 

Algorithm 1 Weighted Coherence Minimization

Task: Solve for a given block-sparsifying dictionary $D_{N \times K}$:

$$A = \arg\min_A \; \tfrac{1}{2}\eta + (1 - \alpha)\mu_B^t + \alpha\nu^t,$$

where $A \in \mathbb{R}^{M \times N}$.

Initialization: Calculate the eigenvalue decomposition of
$DD' = U \Lambda U'$. Set $A^{(0)}$ as the outcome of (4), i.e., $A^{(0)} =
[I_M \; 0]\Lambda^{-1/2}U'$, and n = 0.
Repeat until convergence:

1) Set $G^{(n)} = D'A^{(n)\prime}A^{(n)}D$.

2) Calculate $h_t(G^{(n)})$ as in (17).

3) Find the top M eigenvalues $\Delta_M$ and the corresponding
M eigenvectors $V_M$ of $\Lambda^{-1/2} U' D h_t(G^{(n)}) D' U \Lambda^{-1/2}$.

4) Set $A^{(n+1)} = \Delta_M^{1/2} V_M' \Lambda^{-1/2} U'$.

5) n = n + 1.
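
The steps of Algorithm 1 can be sketched in NumPy as follows. This is an illustrative reading of the algorithm, not the authors' code: the function names, the fixed iteration count used in place of a convergence test, the clipping of negative eigenvalues to zero, and the random test dictionary are all assumptions made here.

```python
import numpy as np

def masks(labels):
    same = labels[:, None] == labels[None, :]    # entries within the same block
    diag = np.eye(len(labels), dtype=bool)
    return same, diag

def objective(G, labels, alpha):
    """f(G) = eta/2 + (1 - alpha) mu_B^t + alpha nu^t."""
    same, diag = masks(labels)
    eta = np.sum((np.diag(G) - 1.0) ** 2)
    mu_t = np.sum(G[~same] ** 2)
    nu_t = np.sum(G[same & ~diag] ** 2)
    return 0.5 * eta + (1 - alpha) * mu_t + alpha * nu_t

def h_t(G, labels, alpha):
    """Weighted target matrix of (17)."""
    same, diag = masks(labels)
    h_eta = np.where(diag, 1.0, G)               # diagonal entries set to 1
    h_mu = np.where(~same, 0.0, G)               # inter-block entries set to 0
    h_nu = np.where(same & ~diag, 0.0, G)        # sub-block off-diagonals set to 0
    return (2.0 / 3.0) * (0.5 * h_eta + (1 - alpha) * h_mu + alpha * h_nu)

def wcm(D, M, block_sizes, alpha, n_iter=100):
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    lam, U = np.linalg.eigh(D @ D.T)             # DD' = U Lambda U'
    W = U @ np.diag(lam ** -0.5)                 # U Lambda^{-1/2}
    A = np.eye(M, D.shape[0]) @ W.T              # initialization: solution of (4)
    history = [objective(D.T @ A.T @ A @ D, labels, alpha)]
    for _ in range(n_iter):                      # assumption: fixed iteration budget
        G = D.T @ A.T @ A @ D                    # step 1
        H = W.T @ D @ h_t(G, labels, alpha) @ D.T @ W   # steps 2 and 3
        delta, V = np.linalg.eigh(0.5 * (H + H.T))      # eigenvalues ascending
        delta_M = np.clip(delta[-M:], 0.0, None) # assumption: clip negatives to 0
        A = np.diag(np.sqrt(delta_M)) @ V[:, -M:].T @ W.T   # step 4
        history.append(objective(D.T @ A.T @ A @ D, labels, alpha))
    return A, history

rng = np.random.default_rng(0)
N, K, M = 18, 36, 12                             # hypothetical sizes
D = rng.standard_normal((N, K))
D /= np.linalg.norm(D, axis=0)
A, history = wcm(D, M, [3] * 12, alpha=0.99)
```

Because each step minimizes a surrogate that upper-bounds the objective, the recorded `history` should be non-increasing; with `alpha=0.5` the initialization already attains the global minimum (K - M)/2, so the iterations leave the objective unchanged.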



V. Experiments 

In this section, we evaluate the contribution of the proposed
sensing matrix design framework empirically. We compare
the recovery and classification abilities of BOMP [22], [23]
when using sensing matrices designed by our method with those
obtained by solving (4), which will be referred to as "Duarte-Sapiro"
(DS) [8].

For each simulation, we repeat the following procedure 100
times. We randomly generate a dictionary $D_{N \times K}$ with
normally distributed entries and normalize its columns. In order
to evaluate WCM on structured dictionaries as well, we repeat
the simulations using a dictionary containing N randomly
selected rows of the K x K Discrete Cosine Transform (DCT)
matrix. The dictionary is divided into K/s blocks of size s.
We then generate L = 1000 test signals X of dimension N
that have k-block-sparse representations $\Theta$ with respect to D.
The generating blocks are chosen randomly and independently
and the coefficients are i.i.d. uniformly distributed. $A_{M \times N}$ is
initialized as the outcome of DS. We find A using the WCM
algorithm, and calculate the equivalent dictionary E = AD
and the measurements Y = AX. Next, we obtain the block-
sparsest representations of the measurements, $\hat{\Theta}$, by applying
BOMP with a fixed number of k nonzero blocks.
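
For readers unfamiliar with BOMP, the following is a minimal sketch of the standard greedy scheme with a fixed number of k blocks: pick the block most correlated with the residual, then refit by least squares on all selected blocks. It is not the implementation used in the experiments; the function name `bomp` and the toy dictionary with orthonormal columns are assumptions for illustration.

```python
import numpy as np

def bomp(E, y, block_sizes, k):
    """Greedy block selection followed by least squares on the chosen blocks.
    Returns the estimated representation and the sorted chosen block indices."""
    edges = np.cumsum([0] + list(block_sizes))
    blocks = [np.arange(a, b) for a, b in zip(edges[:-1], edges[1:])]
    chosen, residual = [], y.copy()
    theta = np.zeros(E.shape[1])
    for _ in range(k):
        # score every unused block by the norm of its correlation with the residual
        scores = [
            -np.inf if j in chosen else np.linalg.norm(E[:, idx].T @ residual)
            for j, idx in enumerate(blocks)
        ]
        chosen.append(int(np.argmax(scores)))
        support = np.concatenate([blocks[j] for j in chosen])
        coef, *_ = np.linalg.lstsq(E[:, support], y, rcond=None)
        theta[:] = 0.0
        theta[support] = coef
        residual = y - E[:, support] @ coef
    return theta, sorted(chosen)

# Toy check: an equivalent dictionary with orthonormal columns and a
# 1-block-sparse representation supported on the middle block.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((12, 9)))
theta_true = np.zeros(9)
theta_true[3:6] = [1.0, -2.0, 0.5]
theta_hat, chosen = bomp(Q, Q @ theta_true, [3, 3, 3], k=1)
```

With orthonormal atoms the correlations equal the true coefficients, so the correct block is selected and the least-squares step recovers the representation exactly; for a general equivalent dictionary E = AD recovery depends on the coherence measures discussed in Section III.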

We use two measures to evaluate the success of the
simulations based on their outputs A and $\hat{\Theta}$:

The percentage of recognized generating subspaces of X
(i.e., successful classification): $r = \|\Theta \odot \hat{\Theta}\|_0 / \|\Theta\|_0$,
where $\odot$ denotes element-wise multiplication.

The normalized representation error: $e = \|X - D\hat{\Theta}\|_F / \|X\|_F$.

To evaluate the performance of the WCM algorithm as a
function of $\alpha$, we choose s = 3, N = 60 and K = 2N = 120.
We repeat the experiment for both types of dictionaries, and
for k = 1 (Fig. 5(a), 5(b)), k = 2 (Fig. 6(a), 6(b)) and k = 3
(Fig. 7(a), 7(b)) nonzero blocks, with respectively M = 6,
M = 14 and M = 20 measurements. To show that the results
remain consistent for higher values of k, we add an experiment
with k = 6, M = 35, N = 180 and K = 2N = 360
(Fig. 8(a), 8(b)). We compare the obtained results to randomly
set sensing matrices and to the outputs of DS [8], based on
the normalized representation error e, the classification success
r, and the ratio between the total sub-block coherence and
the total inter-block coherence $\nu^t / \mu_B^t$. We observe that WCM
and DS coincide at $\alpha = 0.5$ for all the three measures, as
expected. Note that for $\alpha < 0.5$ we get that $\nu^t / \mu_B^t$ is high,
e is high and r is low. On the other hand, when $\alpha > 0.5$,
i.e., when giving more weight to $\nu^t$ and less to $\mu_B^t$, the signal
reconstruction as well as the signal classification are improved
compared to DS. While the improvement for k = 1 is more
significant, it is maintained for higher values of k as well.
Remarkably, for structured dictionaries and for higher values
of k, we see that $\alpha < 0.5$ leads to an improvement of r.
However, e is compromised in this case. We can conclude that
when designing sensing matrices for block-sparse decoding,
the best results are obtained by choosing $\alpha$ close enough to
1. In other words, the best recovery results are obtained when
the equivalent dictionary has nearly orthonormal blocks. This
holds for dictionaries containing normally distributed entries
as well as for dictionaries containing randomly selected rows
of the DCT matrix. As was the case in Fig. 3(b), we observed
empirically that for $\alpha > 0.5$, every local minimum is a
global minimum as well. This means that the WCM algorithm
converges to a global solution of (14) when $\alpha > 0.5$, for all the
experiments presented in this section. We emphasize, however,
that this may not be the case for other sets of parameters.

Fig. 9(a) and Fig. 9(b) show that when using WCM with
$\alpha = 0.99$ on dictionaries with normally distributed entries and
on structured dictionaries, the improvement in signal recovery
is maintained for a wide range of K, starting from square
dictionaries, i.e., K = N, to highly overcomplete dictionaries.
For this experiment, we chose s = 3, N = 60, k = 2 and
M = 14. We note that for both types of dictionaries, the
improvement of WCM over DS increases as the dictionary
becomes more overcomplete.

Finally, we show that WCM improves the results of block- 
sparse decoding for dictionaries with blocks of varying sizes 
as well. The generated dictionaries contain 15 blocks of size 
4 and 20 blocks of size 3, with N = 60 and K = 2N = 120. 
In this example, we set k = 2 and M = 14. The results are 
shown as a function of $\alpha$ in Fig. 10(a) for dictionaries with
normally distributed entries and in Fig. 10(b) for structured
dictionaries. 
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(a) (b) 

Fig. 5. Simulation results of sensing matrix design using the WCM algorithm 
with k = 1 and M = 6. The graphs show the normalized representation 
error e, the classification success r, and the ratio between the total sub-block 
coherence and the total inter-block coherence v l / fj, B as a function of a. In 
(a) the dictionary contains normally distributed entries, and in (b) randomly 
selected rows of the DCT matrix. 



(a) (b) 

Fig. 7. Simulation results of sensing matrix design using the WCM algorithm 
with k = 3 and M = 20. The graphs show the normalized representation 
error e, the classification success r, and the ratio between the total sub-block 
coherence and the total inter-block coherence v l I yi B as a function of a. In 
(a) the dictionary contains normally distributed entries, and in (b) randomly 
selected rows of the DCT matrix. 
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(a) (b) 

Fig. 6. Simulation results of sensing matrix design using the WCM algorithm 
with k = 2 and M = 14. The graphs show the normalized representation 
error e, the classification success r, and the ratio between the total sub-block 
coherence and the total inter-block coherence v t / fi B as a function of a. In 
(a) the dictionary contains normally distributed entries, and in (b) randomly 
selected rows of the DCT matrix. 



(a) (b) 

Fig. 8. Simulation results of sensing matrix design using the WCM algorithm 
with k = 6 and M = 35. The graphs show the normalized representation 
error e, the classification success r, and the ratio between the total sub-block 
coherence and the total inter-block coherence i/ 1 / fj, B as a function of a. In 
(a) the dictionary contains normally distributed entries, and in (b) randomly 
selected rows of the DCT matrix. 
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(a) (b) 

Fig. 9. Simulation results of sensing matrix design using the WCM algorithm 
with k = 2 and M = 14. The graphs show the normalized representation 
error e and the classification success rasa function of K. In (a) the dictionary 
contains normally distributed entries, and in (b) randomly selected rows of 
the DCT matrix. 





Fig. 10. Simulation results of sensing matrix design using the WCM 
algorithm on a dictionary containing 15 blocks of size 4 and 20 blocks of size 
3, with k = 2 and M = 14. The graphs show the normalized representation 
error e, the classification success r, and the ratio between the total sub-block 
coherence and the total inter-block coherence $\nu^t / \mu_B^t$ as a function of $\alpha$. In
(a) the dictionary contains normally distributed entries, and in (b) randomly 
selected rows of the DCT matrix. 



VI. Conclusions 

In this paper, we proposed a framework for the design of a 
sensing matrix, assuming that a block-sparsifying dictionary is 
provided. We minimize a weighted sum of the total inter-block 
coherence and the total sub-block coherence, while attempting 
to keep the atoms in the equivalent dictionary as normalized as 
possible (see (14)). This objective can be seen as an intuitive
extension of the corresponding non-block objective to the case of blocks.

While it might be possible to derive a closed-form solution
to (14), we have presented the Weighted Coherence Minimization
(WCM) algorithm, an iterative solution based on
the bound-optimization method. In this method, the original
objective is replaced in each step with a surrogate objective
that is easier to solve. The algorithm converges to a local
solution of (14).

Simulations have shown that the best results are obtained
when minimizing mostly the total sub-block coherence.
This leads to equivalent dictionaries with nearly orthonormal
blocks, at the price of a slightly increased total inter-block
coherence. The obtained sensing matrix outperforms the one
obtained with the DS algorithm [8], which does not exploit
block structure. This improvement manifests itself in lower
signal reconstruction errors and higher rates of successful
signal classification.
When giving equal weight to the total inter-block coherence
and to the total sub-block coherence, the results are identical to
those of the non-block design. In fact, both objectives coincide
for this specific choice of $\alpha$, which ignores the existence of
a block structure in the sparse representations of the signal data.
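
As a rough numerical illustration of the equal-weight case, the sketch below evaluates a weighted objective of the form $\tfrac{1}{2}\eta + (1-\alpha)\mu + \alpha\nu$ on a random Gram matrix and compares it, at $\alpha = 1/2$, with $\tfrac{1}{2}\|G - I\|_F^2$. The $\tfrac{1}{2}$ weight on the normalization term, the equal block sizes, the problem dimensions, and the helper names `block_masks` and `weighted_objective` are assumptions made for illustration only.

```python
import numpy as np

def block_masks(num_blocks, block_size):
    """Boolean masks for the three entry types of a block-partitioned Gram matrix.

    Equal block sizes are an assumption of this sketch.
    """
    block_id = np.repeat(np.arange(num_blocks), block_size)
    same_block = block_id[:, None] == block_id[None, :]
    diag = np.eye(block_id.size, dtype=bool)
    # (diagonal, inter-block, intra-block off-diagonal)
    return diag, ~same_block, same_block & ~diag

def weighted_objective(G, alpha, masks):
    """0.5 * normalization penalty + (1 - alpha) * inter-block + alpha * sub-block."""
    diag, inter, intra = masks
    eta = np.sum((G[diag] - 1.0) ** 2)   # deviation of atom norms from 1
    mu = np.sum(G[inter] ** 2)           # total inter-block coherence
    nu = np.sum(G[intra] ** 2)           # total sub-block coherence
    return 0.5 * eta + (1 - alpha) * mu + alpha * nu

rng = np.random.default_rng(0)
masks = block_masks(num_blocks=4, block_size=3)
E = rng.standard_normal((6, 12))         # hypothetical equivalent dictionary
G = E.T @ E

f_equal = weighted_objective(G, 0.5, masks)
f_plain = 0.5 * np.linalg.norm(G - np.eye(12), "fro") ** 2
print(f_equal, f_plain)
```

Under these assumptions the three masks partition the entries of $G$, so at $\alpha = 1/2$ every entry receives the same weight and the block structure no longer plays a role.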

Appendix A 
Proof of convergence 

The surrogate function $g(G, G^{(n)})$ has been chosen in such
a way as to bound the original objective $f(G)$ from above for
every $G$, and to coincide with it at $G = G^{(n)}$. Minimizing
$g(G, G^{(n)})$ will then necessarily decrease the value of $f(G)$:

$$\min_G g(G, G^{(n)}) \le g(G^{(n)}, G^{(n)}) = f(G^{(n)}),$$

$$f(G^{(n+1)}) \le g(G^{(n+1)}, G^{(n)}) = \min_G g(G, G^{(n)}).$$

Formally, according to [25], the sequence of solutions generated
by iteratively solving

$$G^{(n+1)} = \arg\min_G g(G, G^{(n)}) \qquad (18)$$

is guaranteed to converge to a local minimum of the original
objective $f(G)$ when the surrogate objective $g(G, G^{(n)})$
satisfies the following three constraints:

1) Equality at $G = G^{(n)}$:

$$g(G^{(n)}, G^{(n)}) = f(G^{(n)}).$$

2) Upper-bounding the original function:

$$g(G, G^{(n)}) \ge f(G), \quad \forall G.$$

3) Equal gradient at $G = G^{(n)}$:

$$\nabla g(G, G^{(n)})\big|_{G = G^{(n)}} = \nabla f(G)\big|_{G = G^{(n)}}.$$



We next prove that the three conditions hold.

Proof: Equality at $G = G^{(n)}$: This follows from the
definition of $g(G, G^{(n)})$.

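Upper-bounding the original function: Let us rewrite both
functions $g(G, G^{(n)})$ and $f(G)$ using the definition of the
Frobenius norm:

$$g(G, G^{(n)}) = \sum_{i,j} \sum_{m,n} \Big[ \tfrac{1}{2} \big( (G - h_\eta(G^{(n)}))[i,j]^m_n \big)^2 + (1-\alpha) \big( (G - h_\mu(G^{(n)}))[i,j]^m_n \big)^2 + \alpha \big( (G - h_\nu(G^{(n)}))[i,j]^m_n \big)^2 \Big]$$

and

$$f(G) = \sum_{i,j} \sum_{m,n} \Big[ \tfrac{1}{2} \big( u_\eta(G)[i,j]^m_n \big)^2 + (1-\alpha) \big( u_\mu(G)[i,j]^m_n \big)^2 + \alpha \big( u_\nu(G)[i,j]^m_n \big)^2 \Big].$$

The following observations prove that each of the terms in
$g(G, G^{(n)})$ is larger than or equal to its counterpart in $f(G)$,
and therefore $g(G, G^{(n)}) \ge f(G)$:

$$u_\eta(G)[i,j]^m_n = \begin{cases} G[i,j]^m_n - 1, & i = j,\ m = n; \\ 0, & \text{else.} \end{cases}$$

$$(G - h_\eta(G^{(n)}))[i,j]^m_n = \begin{cases} G[i,i]^m_n - 1, & i = j,\ m = n; \\ (G - G^{(n)})[i,j]^m_n, & \text{else.} \end{cases}$$

$$u_\mu(G)[i,j]^m_n = \begin{cases} G[i,j]^m_n, & i \ne j; \\ 0, & \text{else.} \end{cases}$$

$$(G - h_\mu(G^{(n)}))[i,j]^m_n = \begin{cases} G[i,j]^m_n, & i \ne j; \\ (G - G^{(n)})[i,j]^m_n, & \text{else.} \end{cases}$$

$$u_\nu(G)[i,j]^m_n = \begin{cases} G[i,j]^m_n, & i = j,\ m \ne n; \\ 0, & \text{else.} \end{cases}$$

$$(G - h_\nu(G^{(n)}))[i,j]^m_n = \begin{cases} G[i,j]^m_n, & i = j,\ m \ne n; \\ (G - G^{(n)})[i,j]^m_n, & \text{else.} \end{cases}$$

In each pair the two expressions agree on the listed entries, while
on all remaining entries the term of $f(G)$ is zero and the term of
$g(G, G^{(n)})$ is a square, hence nonnegative.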

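Equal gradient at $G = G^{(n)}$: We calculate the gradient of
$g(G, G^{(n)})$ and $f(G)$:

$$\nabla g(G, G^{(n)}) = 2 \Big[ \tfrac{1}{2} (G - h_\eta(G^{(n)})) + (1-\alpha)(G - h_\mu(G^{(n)})) + \alpha (G - h_\nu(G^{(n)})) \Big],$$

$$\nabla f(G) = 2 \Big[ \tfrac{1}{2} u_\eta(G) + (1-\alpha) u_\mu(G) + \alpha u_\nu(G) \Big].$$

When substituting $G = G^{(n)}$ we obtain:

$$\nabla g(G, G^{(n)})\big|_{G = G^{(n)}} = \nabla f(G)\big|_{G = G^{(n)}} = 2 \Big( \tfrac{1}{2} u_\eta(G^{(n)}) + (1-\alpha) u_\mu(G^{(n)}) + \alpha u_\nu(G^{(n)}) \Big).$$

Therefore, the gradients of both objectives coincide at
$G = G^{(n)}$. This completes the convergence proof. ■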
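
The three conditions can also be checked numerically. The following is a minimal sketch, assuming equal block sizes, a weight of $\tfrac{1}{2}$ on the normalization term, and arbitrarily chosen dimensions and $\alpha$; the helper names (`block_masks`, `u_terms`, `residuals`, `weighted`) are hypothetical and not part of the original text.

```python
import numpy as np

def block_masks(num_blocks, block_size):
    """Masks for diagonal, inter-block, and intra-block off-diagonal entries."""
    block_id = np.repeat(np.arange(num_blocks), block_size)
    same_block = block_id[:, None] == block_id[None, :]
    diag = np.eye(block_id.size, dtype=bool)
    return diag, ~same_block, same_block & ~diag

def u_terms(G, masks):
    """u_eta(G), u_mu(G), u_nu(G) as full-size matrices."""
    diag, inter, intra = masks
    return (np.where(diag, G - 1.0, 0.0),
            np.where(inter, G, 0.0),
            np.where(intra, G, 0.0))

def residuals(G, Gn, masks):
    """G - h_eta(Gn), G - h_mu(Gn), G - h_nu(Gn)."""
    diag, inter, intra = masks
    return (G - np.where(diag, 1.0, Gn),
            G - np.where(inter, 0.0, Gn),
            G - np.where(intra, 0.0, Gn))

def weighted(terms, alpha):
    """Weighted combination 1/2 * a + (1 - alpha) * b + alpha * c."""
    a, b, c = terms
    return 0.5 * a + (1 - alpha) * b + alpha * c

def f(G, alpha, masks):
    return weighted([np.sum(t ** 2) for t in u_terms(G, masks)], alpha)

def g(G, Gn, alpha, masks):
    return weighted([np.sum(t ** 2) for t in residuals(G, Gn, masks)], alpha)

rng = np.random.default_rng(1)
masks = block_masks(num_blocks=3, block_size=2)
alpha = 0.8

def random_gram():
    """Gram matrix of a random 4 x 6 equivalent dictionary (illustrative sizes)."""
    E = rng.standard_normal((4, 6))
    return E.T @ E

Gn = random_gram()

# 1) Equality at G = Gn.
equality_ok = np.isclose(g(Gn, Gn, alpha, masks), f(Gn, alpha, masks))

# 2) Upper bound, tested on random Gram matrices.
bound_ok = all(
    g(G, Gn, alpha, masks) >= f(G, alpha, masks) - 1e-9
    for G in (random_gram() for _ in range(100))
)

# 3) Equal gradient at G = Gn.
grad_g = 2 * weighted(residuals(Gn, Gn, masks), alpha)
grad_f = 2 * weighted(u_terms(Gn, masks), alpha)
gradient_ok = np.allclose(grad_g, grad_f)

print(equality_ok, bound_ok, gradient_ok)
```

A random test of this kind does not replace the proof; it only guards against sign or indexing mistakes when implementing $f$ and $g$.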



Appendix B 
Proof of Proposition 1 

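Proof: In order to minimize $g(G, G^{(n)})$, we rewrite the
problem in an alternative form, dropping additive constants and
positive scale factors that do not affect the minimizer:

$$\begin{aligned}
\min_G g(G, \cdot) &= \min_A \operatorname{tr}\Big( \tfrac{3}{2} G'G - 2G' \big[ \tfrac{1}{2} h_\eta(\cdot) + (1-\alpha) h_\mu(\cdot) + \alpha h_\nu(\cdot) \big] \Big) \\
&= \min_A \operatorname{tr}\big( E'EE'E - 2E'E\, h_t(\cdot) \big) \\
&= \min_A \operatorname{tr}\big( EE'EE' - 2E\, h_t(\cdot)\, E' \big) \\
&= \min_A \operatorname{tr}\big( ADD'A'ADD'A' - 2AD\, h_t(\cdot)\, D'A' \big),
\end{aligned} \qquad (19)$$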



where $h_t(\cdot)$ is defined in (17). Let $U \Lambda U'$ be the eigenvalue
decomposition of $DD'$ and define $\Gamma_{M \times N} = A U \Lambda^{1/2}$.
Substituting into (19) yields:

$$\begin{aligned}
\min_G g(G, \cdot) &= \min_\Gamma \operatorname{tr}\big( \Gamma\Gamma'\Gamma\Gamma' - 2\Gamma \Lambda^{-1/2} U' D\, h_t(\cdot)\, D' U \Lambda^{-1/2} \Gamma' \big) \\
&= \min_\Gamma \big\| \Gamma'\Gamma - \tilde{h}_t(\cdot) \big\|_F^2,
\end{aligned} \qquad (20)$$

where $\tilde{h}_t(\cdot) = \Lambda^{-1/2} U' D\, h_t(\cdot)\, D' U \Lambda^{-1/2}$.
The surrogate objective $g(G, G^{(n)})$ can then be minimized in closed
form by finding the top $M$ components of $\tilde{h}_t(G^{(n)})$. Let
$\Delta_M$ be the diagonal matrix of the top $M$ eigenvalues of
$\tilde{h}_t(G^{(n)})$ and $V_M$ the corresponding $M$ eigenvectors.
Then, (20) is solved by setting $\Gamma = \Delta_M^{1/2} V_M'$. Note that
this solution is not unique, since $\Gamma$ can be multiplied on the left
by any unitary matrix. Finally, the optimal sensing matrix is given by
$A^{(n+1)} = \Gamma \Lambda^{-1/2} U' = \Delta_M^{1/2} V_M' \Lambda^{-1/2} U'$.
The resulting Gram matrix $G^{(n+1)}$ is not influenced by the
multiplication of $A^{(n+1)}$ on the left by a unitary matrix.
Therefore, the WCM algorithm is not affected by the choice of
$A^{(n+1)}$. ■
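
The closed-form update derived above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation. It assumes $h_t = \tfrac{2}{3}\big(\tfrac{1}{2}h_\eta + (1-\alpha)h_\mu + \alpha h_\nu\big)$, equal block sizes, a dictionary $D$ with full row rank, and clipping of negative eigenvalues to zero so that $\Delta_M^{1/2}$ stays real. The function names `block_masks`, `objective`, and `wcm_step` and all dimensions are hypothetical.

```python
import numpy as np

def block_masks(num_blocks, block_size):
    """Masks for diagonal, inter-block, and intra-block off-diagonal entries."""
    block_id = np.repeat(np.arange(num_blocks), block_size)
    same_block = block_id[:, None] == block_id[None, :]
    diag = np.eye(block_id.size, dtype=bool)
    return diag, ~same_block, same_block & ~diag

def objective(G, alpha, masks):
    """Weighted objective f(G); the 1/2 weight is an assumption of this sketch."""
    diag, inter, intra = masks
    return (0.5 * np.sum((G[diag] - 1.0) ** 2)
            + (1 - alpha) * np.sum(G[inter] ** 2)
            + alpha * np.sum(G[intra] ** 2))

def wcm_step(A, D, alpha, masks):
    """One surrogate minimization: returns the next sensing matrix."""
    M = A.shape[0]
    diag, inter, intra = masks
    E = A @ D
    G = E.T @ E
    # Targets h_eta, h_mu, h_nu: copy G and overwrite the penalized entries.
    h_eta = np.where(diag, 1.0, G)
    h_mu = np.where(inter, 0.0, G)
    h_nu = np.where(intra, 0.0, G)
    h_t = (2.0 / 3.0) * (0.5 * h_eta + (1 - alpha) * h_mu + alpha * h_nu)
    # DD' = U Lambda U'; scaling the columns of U gives U Lambda^{-1/2}.
    lam, U = np.linalg.eigh(D @ D.T)
    U_scaled = U / np.sqrt(lam)
    h_tilde = U_scaled.T @ D @ h_t @ D.T @ U_scaled
    h_tilde = 0.5 * (h_tilde + h_tilde.T)      # remove rounding asymmetry
    w, V = np.linalg.eigh(h_tilde)             # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:M]
    delta = np.clip(w[top], 0.0, None)         # assumption: clip negatives
    Gamma = np.sqrt(delta)[:, None] * V[:, top].T   # Delta_M^{1/2} V_M'
    return Gamma @ U_scaled.T                  # Gamma Lambda^{-1/2} U'

rng = np.random.default_rng(2)
N, M, num_blocks, block_size = 12, 6, 8, 3
D = rng.standard_normal((N, num_blocks * block_size))
D /= np.linalg.norm(D, axis=0)                 # unit-norm atoms (assumed)
A = rng.standard_normal((M, N))
masks = block_masks(num_blocks, block_size)

history = []
for _ in range(30):
    E = A @ D
    history.append(objective(E.T @ E, 0.9, masks))
    A = wcm_step(A, D, 0.9, masks)
print(history[0], history[-1])
```

Since each step minimizes the surrogate exactly, the recorded objective values should be non-increasing, which is a convenient sanity check for any implementation of the update.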